Variations on a theme by Schalkwijk and Kailath

Robert G. Gallager, Barış Nakiboğlu



Abstract 

Schalkwijk and Kailath (1966) developed a class of block codes for Gaussian channels with ideal feedback
for which the probability of decoding error decreases as a second-order exponent in block length for rates below
capacity. This well-known but surprising result is explained and simply derived here in terms of a result by Elias
(1956) concerning the minimum mean-square distortion achievable in transmitting a single Gaussian random variable
over multiple uses of the same Gaussian channel. A simple modification of the Schalkwijk-Kailath scheme is then
shown to have an error probability that decreases with an exponential order which is linearly increasing with block
length. In the infinite bandwidth limit, this scheme produces zero error probability using bounded expected energy
at all rates below capacity. A lower bound on error probability for the finite bandwidth case is then derived in which
the error probability decreases with an exponential order which is linearly increasing in block length at the same
rate as the upper bound.

I. Introduction

This note describes coding and decoding strategies for discrete-time additive memoryless Gaussian-noise (DAMGN)
channels with ideal feedback. It was shown by Shannon [14] in 1961 that feedback does not increase the capacity
of memoryless channels, and was shown by Pinsker [10] in 1968 that fixed-length block codes on Gaussian-noise
channels with feedback cannot exceed the sphere packing bound if the energy per codeword is bounded
independently of the noise realization. It is clear, however, that reliable communication can be simplified by the
use of feedback, as illustrated by standard automatic repeat strategies at the data link control layer. There is a
substantial literature (for example [11], [3], [9]) on using variable-length strategies to substantially improve the rate
of exponential decay of error probability with expected coding constraint length. These strategies essentially use
the feedback to coordinate postponement of the final decision when the noise would otherwise cause errors. Thus
small error probabilities can be achieved through the use of occasional long delays, while keeping the expected
delay small.

For DAMGN channels an additional mechanism for using feedback exists whereby the transmitter can transmit
unusually large amplitude signals when it observes that the receiver is in danger of making a decoding error. The
power (i.e., the expected squared amplitude) can be kept small because these large amplitude signals are rarely
required. In 1966, Schalkwijk and Kailath [13] used this mechanism in a fixed-length block-coding scheme for
infinite bandwidth Gaussian noise channels with ideal feedback. They demonstrated the surprising result that the
resulting probability of decoding error decreases as a second order exponential in the code constraint length at all
transmission rates less than capacity. Schalkwijk [12] extended this result to the finite bandwidth case, i.e., DAMGN
channels. Later, Kramer [8] (for the infinite bandwidth case) and Zigangirov [15] (for the finite bandwidth case)
showed that the above doubly exponential bounds could be replaced by $k$th order exponential bounds for any $k > 2$
in the limit of arbitrarily large block lengths. Later encoding schemes inspired by the Schalkwijk and Kailath
approach have been developed for multi-user communication with DAMGN [16], [17], [18], [19], [20], secure
communication with DAMGN [21], and point-to-point communication for Gaussian noise channels with memory
[22].

The purpose of this paper is three-fold. First, the existing results for DAMGN channels with ideal feedback are
made more transparent by expressing them in terms of a 1956 paper by Elias on transmitting a single signal from
a Gaussian source via multiple uses of a DAMGN channel with feedback. Second, using an approach similar to
that of Zigangirov in [15], we strengthen the results of [8] and [15], showing that error probability can be made
to decrease with blocklength $n$ at least with an exponential order¹ $an - b$ for given coefficients $a > 0$ and $b > 0$.

¹For integer $k \ge 1$, the $k$th order exponent function $g_k(x)$ is defined as $g_k(x) = \exp(\exp(\cdots(\exp(x))\cdots))$ with $k$ repetitions of exp. A
function $f(x) \ge 0$ is said to decrease as a $k$th order exponential if for some constant $A > 0$ and all sufficiently large $x$, $f(x) \le 1/g_k(Ax)$.
Fig. 1. The setup for n channel uses per source use with ideal feedback.



Third, a lower bound is derived. This lower bound decreases with an exponential order in $n$ equal to $an + b'(n)$
where $a$ is the same as in the upper bound and $b'(n)$ is a sublinear function² of the block length $n$.

Neither this paper nor the earlier results in [12], [13], [8], and [15] are intended to be practical. Indeed, these 
second and higher order exponents require unbounded amplitudes (see [10], [2], [9]). Also Kim et al [7] have 
recently shown that if the feedback is ideal except for additive Gaussian noise, then the error probability decreases 
only as a single exponential in block length, although the exponent increases with increasing signal-to-noise ratio 
in the feedback channel. Thus our purpose here is simply to provide increased understanding of the ideal conditions 
assumed. 

We first review the Elias result [4] and use it to get an almost trivial derivation of the Schalkwijk and Kailath
results. The derivation yields an exact expression for error probability, optimized over a class of algorithms including 
those in [12], [13]. The linear processing inherent in that class of algorithms is relaxed to obtain error probabilities 
that decrease with block length n at a rate much faster than an exponential order of 2. Finally a lower bound to 
the probability of decoding error is derived. This lower bound is first derived for the case of two codewords and 
is then generalized to arbitrary rates less than capacity. 



II. The feedback channel and the Elias result 

Let $X_1, \ldots, X_n = X_1^n$ represent $n \ge 1$ successive inputs to a discrete-time additive memoryless Gaussian noise
(DAMGN) channel with ideal feedback. That is, the channel outputs $Y_1, \ldots, Y_n = Y_1^n$ satisfy $Y_1^n = X_1^n + Z_1^n$
where $Z_1^n$ is an $n$-tuple of statistically independent Gaussian random variables, each with zero mean and variance
$\sigma_Z^2$, denoted $\mathcal{N}(0, \sigma_Z^2)$. The channel inputs are constrained to some given average power constraint $S$ in the sense
that the inputs must satisfy the second-moment constraint

$$\frac{1}{n}\sum_{i=1}^{n} S_i \le S \quad \text{where } S_i = \mathrm{E}[X_i^2]. \qquad (1)$$

Without loss of generality, we take $\sigma_Z^2 = 1$. Thus $S$ is both a power constraint and a signal-to-noise ratio constraint.

A discrete-time channel is said to have ideal feedback if each output $Y_i$, $1 \le i \le n$, is made known to the
transmitter in time to generate input $X_{i+1}$ (see Figure 1). Let $U_1$ be the random source symbol to be communicated
via this $n$-tuple of channel uses. Then each channel input $X_i$ is some function $f_i(U_1, Y_1^{i-1})$ of the source and
previous outputs. Assume (as usual) that $U_1$ is statistically independent of $Z_1^n$.

Elias [4] was interested in the situation where $U_1 \sim \mathcal{N}(0, \sigma_1^2)$ is a Gaussian random variable rather than a
discrete message. For $n = 1$, the rate-distortion bound (with a mean-square distortion measure) is achieved without
coding or feedback. For $n > 1$, attempts to map $U_1$ into an $n$-dimensional channel input in the absence of feedback
involve non-linear or twisted modulation techniques that are ugly at best. Using the ideal feedback, however, Elias
constructed a simple and elegant procedure for using the $n$ channel symbols to send $U_1$ in such a way as to meet
the rate-distortion bound with equality.

Let $S_i = \mathrm{E}[X_i^2]$ be an arbitrary choice of energy, i.e., second moment, for each $i$, $1 \le i \le n$. It will be shown
shortly that the optimal choice for $S_1, \ldots, S_n$, subject to (1), is $S_i = S$ for $1 \le i \le n$. Elias's strategy starts
by choosing the first transmitted signal $X_1$ to be a linear scaling of the source variable $U_1$, scaled to meet the
second-moment constraint, i.e.,

$$X_1 = \frac{\sqrt{S_1}\,U_1}{\sigma_1}.$$



²i.e., $\lim_{n\to\infty} b'(n)/n = 0$.
At the receiver, the minimum mean-square error (MMSE) estimate of $X_1$ is $\mathrm{E}[X_1|Y_1] = \frac{S_1 Y_1}{1+S_1}$, and the error in that
estimate is $\mathcal{N}(0, \frac{S_1}{1+S_1})$. It is more convenient to keep track of the MMSE estimate of $U_1$ and the error $U_2$ in that
estimate. Since $U_1$ and $X_1$ are the same except for the scale factor $\sigma_1/\sqrt{S_1}$, these are given by

$$\mathrm{E}[U_1|Y_1] = \frac{\sigma_1\sqrt{S_1}\,Y_1}{1+S_1} \qquad (2)$$

$$U_2 = U_1 - \mathrm{E}[U_1|Y_1] \qquad (3)$$

where $U_2 \sim \mathcal{N}(0, \sigma_2^2)$ and $\sigma_2^2 = \frac{\sigma_1^2}{1+S_1}$.

Using the feedback, the transmitter can calculate the error term $U_2$ at time 2. Elias's strategy is to use $U_2$ as
the source signal (without a second-moment constraint) for the second transmission. This unconstrained signal $U_2$
is then linearly scaled to meet the second moment constraint $S_2$ for the second transmission. Thus the second
transmitted signal $X_2$ is given by

$$X_2 = \frac{\sqrt{S_2}\,U_2}{\sigma_2}.$$

We use this notational device throughout, referring to the unconstrained source signal to be sent at time $i$ by $U_i$
and to the linear scaling of $U_i$, scaled to meet the second moment constraint $S_i$, as $X_i$.

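The receiver calculates the MMSE estimate $\mathrm{E}[U_2|Y_2] = \frac{\sigma_2\sqrt{S_2}\,Y_2}{1+S_2}$ and the transmitter then calculates the error in
this estimate, $U_3 = U_2 - \mathrm{E}[U_2|Y_2]$. Note that

$$U_1 = U_2 + \mathrm{E}[U_1|Y_1] = U_3 + \mathrm{E}[U_2|Y_2] + \mathrm{E}[U_1|Y_1].$$

Thus $U_3$ can be viewed as the error arising from estimating $U_1$ by $\mathrm{E}[U_1|Y_1] + \mathrm{E}[U_2|Y_2]$. The receiver continues
to update its estimate of $U_1$ on subsequent channel uses, and the transmitter continues to transmit linearly scaled
versions of the current estimation error. Then the general expressions are as follows:

$$X_i = \frac{\sqrt{S_i}\,U_i}{\sigma_i}; \qquad (4)$$

$$\mathrm{E}[U_i|Y_i] = \frac{\sigma_i\sqrt{S_i}\,Y_i}{1+S_i}; \qquad (5)$$

$$U_{i+1} = U_i - \mathrm{E}[U_i|Y_i], \qquad (6)$$

where $U_{i+1} \sim \mathcal{N}(0, \sigma_{i+1}^2)$ and $\sigma_{i+1}^2 = \sigma_i^2/(1+S_i)$.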

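Iterating on equation (6) from $i = 1$ to $n$ yields

$$U_{n+1} = U_1 - \sum_{i=1}^{n} \mathrm{E}[U_i|Y_i]. \qquad (7)$$

Similarly, iterating on $\sigma_{i+1}^2 = \sigma_i^2/(1+S_i)$, we get

$$\sigma_{n+1}^2 = \frac{\sigma_1^2}{\prod_{i=1}^{n}(1+S_i)}. \qquad (8)$$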

This says that the error arising from estimating $U_1$ by $\sum_{i=1}^{n} \mathrm{E}[U_i|Y_i]$ is $\mathcal{N}(0, \sigma_{n+1}^2)$. This is valid for any
(non-negative) choice of $S_1, \ldots, S_n$, and this is minimized, subject to $\sum_{i=1}^{n} S_i = nS$, by $S_i = S$ for $1 \le i \le n$. With
this optimal assignment, the mean square estimation error in $U_1$ after $n$ channel uses is

$$\sigma_{n+1}^2 = \frac{\sigma_1^2}{(1+S)^n}. \qquad (9)$$
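The recursion in (4)-(6) is easy to check numerically. The following is a minimal Python sketch, not part of the original analysis: the function names `elias_error_variance` and `elias_transmit` and the parameter values in the demo are illustrative assumptions. It runs the Elias iteration over a simulated unit-variance Gaussian channel and compares the empirical mean-square error with (9).

```python
import math
import random


def elias_error_variance(sigma1_sq, energies):
    """Variance of U_{n+1} after one Elias step per entry of `energies`, as in (8)."""
    var = sigma1_sq
    for s_i in energies:
        var /= 1.0 + s_i  # sigma_{i+1}^2 = sigma_i^2 / (1 + S_i)
    return var


def elias_transmit(u1, sigma1_sq, energies, rng):
    """Run (4)-(6) once over a unit-variance Gaussian channel.

    Returns the receiver's estimate of u1, i.e. the sum of E[U_i | Y_i].
    """
    u, var, estimate = u1, sigma1_sq, 0.0
    for s_i in energies:
        sigma = math.sqrt(var)
        x = math.sqrt(s_i) * u / sigma                   # eq. (4)
        y = x + rng.gauss(0.0, 1.0)                      # channel: Y_i = X_i + Z_i
        est = sigma * math.sqrt(s_i) * y / (1.0 + s_i)   # eq. (5)
        estimate += est
        u -= est                                         # eq. (6), computed via feedback
        var /= 1.0 + s_i
    return estimate


if __name__ == "__main__":
    rng = random.Random(0)
    n, S, sigma1_sq = 8, 1.0, 4.0   # illustrative values, not from the paper
    trials = 20000
    mse = 0.0
    for _ in range(trials):
        u1 = rng.gauss(0.0, math.sqrt(sigma1_sq))
        mse += (u1 - elias_transmit(u1, sigma1_sq, [S] * n, rng)) ** 2
    print("empirical MSE:", mse / trials)
    print("eq. (9) value:", sigma1_sq / (1.0 + S) ** n)
```

For a fixed total energy, `elias_error_variance` can also be used to confirm that the equal allocation $S_i = S$ gives a smaller error variance than an unequal one.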

We now show that this is the minimum mean-square error over all ways of using the channel. The rate-distortion
function for this Gaussian source with a squared-difference distortion measure is well known to be

$$R(d) = \frac{1}{2}\ln\frac{\sigma_1^2}{d}.$$

This is the minimum mutual information, over all channels, required to achieve a mean-square error (distortion)
equal to $d$. For $d = \sigma_1^2/(1+S)^n$, $R(d)$ is $\frac{n}{2}\ln(1+S)$, which is the capacity of this channel over $n$ uses (it
was shown by Shannon [14] that feedback does not increase the capacity of memoryless channels). Thus the
Elias scheme actually meets the rate-distortion bound with equality, and no other coding system, no matter how
complex, can achieve a smaller mean-square error. Note that (9) is also valid in the degenerate case $n = 1$. What
is surprising about this result is not so much that it meets the rate-distortion bound, but rather that the mean-square
estimation error goes down geometrically with $n$. It is this property that leads directly to the doubly exponential
error probability of the Schalkwijk-Kailath scheme.



III. The Schalkwijk-Kailath scheme 

The Schalkwijk and Kailath (SK) scheme will now be defined in terms of the Elias scheme,³ still assuming the
discrete-time channel model of Figure 1 and the power constraint of (1). The source is a set of $M$ equiprobable
symbols, denoted by $\{1, 2, \ldots, M\}$. The channel uses will now be numbered from 0 to $n-1$, since the use at
time 0 will be quite distinct from the others. The source signal $U_0$ is a standard $M$-PAM modulation of the source
symbol. That is, for each symbol $m$, $1 \le m \le M$, from the source alphabet, $m$ is mapped into the signal $a_m$ where
$a_m = m - (M+1)/2$. Thus the $M$ signals in $U_0$ are symmetric around 0 with unit spacing. Assuming equiprobable
symbols, the second moment $\sigma_0^2$ of $U_0$ is $(M^2-1)/12$. The initial channel input $X_0$ is a linear scaling of $U_0$, scaled
to have an energy $S_0$ to be determined later. Thus $X_0$ is an $M$-PAM encoding, with signal separation $d_0 = \sqrt{S_0}/\sigma_0$.




The received signal $Y_0 = X_0 + Z_0$ is fed back to the transmitter, which, knowing $X_0$, determines $Z_0$. In the
following $n-1$ channel uses, the Elias scheme is used to send the Gaussian random variable $Z_0$ to the receiver,
thus reducing the effect of the noise on the original transmission. After the $n-1$ transmissions to convey $Z_0$, the
receiver combines its estimate of $Z_0$ with $Y_0$ to get an estimate of $X_0$, from which the $M$-ary signal is detected.
Specifically, the transmitted and received signals for times $1 \le i \le n-1$ are given by equations (4), (5) and
(6). At time 1, the unconstrained signal $U_1$ is $Z_0$ and $\sigma_1^2 = \mathrm{E}[U_1^2] = 1$. Thus the transmitted signal $X_1$ is given
by $\sqrt{S_1}\,U_1$, where the second moment $S_1$ is to be selected later. We choose $S_i = S_1$ for $1 \le i \le n-1$ for
optimized use of the Elias scheme, and thus the power constraint in (1) becomes $S_0 + (n-1)S_1 = nS$. At the end
of transmission $n-1$, the receiver's estimate of $Z_0$ from $Y_1, \ldots, Y_{n-1}$ is given by (7) as

$$\mathrm{E}[Z_0 \mid Y_1^{n-1}] = \sum_{i=1}^{n-1} \mathrm{E}[U_i|Y_i].$$

The error in this estimate, $U_n = Z_0 - \mathrm{E}[Z_0 \mid Y_1^{n-1}]$, is a zero-mean Gaussian random variable with variance $\sigma_n^2$,
where $\sigma_n^2$ is given by (9) to be

$$\sigma_n^2 = \frac{1}{(1+S_1)^{n-1}}. \qquad (11)$$

Since $Y_0 = X_0 + Z_0$ and $Z_0 = \mathrm{E}[Z_0 \mid Y_1^{n-1}] + U_n$ we have

$$Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n-1}] = X_0 + U_n \qquad (12)$$

where $U_n \sim \mathcal{N}(0, \sigma_n^2)$.

Note that $U_n \sim \mathcal{N}(0, \sigma_n^2)$ is a function of the noise vector $Z_0^{n-1}$ and is thus statistically independent⁴ of $X_0$.
Thus, detecting $X_0$ from $Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n-1}]$ (which is known at the receiver) is the simplest of classical detection
problems, namely that of detecting an $M$-PAM signal $X_0$ from the signal plus an independent Gaussian noise
variable $U_n$. Using maximum likelihood detection, an error occurs only if $U_n$ exceeds half the distance between

³The analysis here is tutorial and was carried out in slightly simplified form in [5, p. 481]. A very readable further simplified analysis is
in [23].

⁴Furthermore, for the given feedback strategy, Gaussian estimation theory can be used to show, first, that $U_n$ is independent of
$\mathrm{E}[Z_0 \mid Y_1^{n-1}]$, and, second, that $\tilde{Y} = Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n-1}]$ is a sufficient statistic for $X_0$ based on $Y_0^{n-1}$ (i.e., $\Pr[X_0 \mid Y_0^{n-1}] = \Pr[X_0 \mid \tilde{Y}]$). Thus
this detection strategy is not as ad hoc as it might initially seem.
signal points, i.e., if $|U_n| \ge \frac{d_0}{2} = \frac{1}{2}\sqrt{\frac{12 S_0}{M^2-1}}$. Since the variance of $U_n$ is $(1+S_1)^{-(n-1)}$, the probability of error
is given by⁵

$$P_e = 2\,\frac{M-1}{M}\,Q(\gamma_n) \qquad (13)$$

where $\gamma_n = \frac{1}{2}\sqrt{\frac{12\,S_0\,(1+S_1)^{n-1}}{M^2-1}}$ and $Q(x)$ is the complementary distribution function of $\mathcal{N}(0,1)$, i.e.,

$$Q(x) = \frac{1}{\sqrt{2\pi}}\int_x^{\infty} \exp\left(-\frac{z^2}{2}\right) dz. \qquad (14)$$



Choosing $S_0$ and $S_1$, subject to $S_0 + (n-1)S_1 = nS$, to maximize $\gamma_n$ (and thus minimize $P_e$), we get $S_1 = \max\{0, S - \frac{1}{n}\}$.
That is, if $nS$ is less than 1, all the energy is used to send $X_0$ and the feedback is unused. We assume $nS > 1$
in what follows, since for any given $S > 0$ this holds for large enough $n$. In this case, $S_0$ is one unit larger than
$S_1$, leading to

$$S_1 = S - \frac{1}{n}; \qquad S_0 = S_1 + 1. \qquad (15)$$

Substituting (15) into (13),

$$P_e = 2\,\frac{M-1}{M}\,Q(\gamma_n) \qquad (16)$$

where $\gamma_n = \sqrt{\frac{3\,(1+S-\frac{1}{n})^n}{M^2-1}}$.
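The energy split in (15) and the resulting error probability (16) can be checked with a few lines of Python. This is a sketch under stated assumptions: the function names are illustrative, and $Q$ is evaluated through the standard-library `math.erfc`.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def sk_gamma(n, M, s0, s1):
    """gamma_n from (13): half the PAM spacing divided by the std of U_n."""
    return math.sqrt(3.0 * s0 * (1.0 + s1) ** (n - 1) / (M * M - 1.0))


def sk_optimal_split(n, S):
    """Energy split of (15); falls back to S_1 = 0 when nS <= 1."""
    s1 = max(0.0, S - 1.0 / n)
    s0 = n * S - (n - 1) * s1
    return s0, s1


def sk_error_probability(n, M, S):
    """P_e of (16) for M-PAM followed by the Elias scheme and ML detection."""
    s0, s1 = sk_optimal_split(n, S)
    return 2.0 * (M - 1) / M * q_function(sk_gamma(n, M, s0, s1))


if __name__ == "__main__":
    n, M, S = 10, 16, 1.0   # illustrative values
    s0, s1 = sk_optimal_split(n, S)
    print("S0, S1 =", s0, s1)
    print("P_e    =", sk_error_probability(n, M, S))
```

Evaluating `sk_gamma` on a grid of other feasible splits with $S_0 + (n-1)S_1 = nS$ is a quick way to see that no other split gives a larger $\gamma_n$.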

This is an exact expression for error probability, optimized over energy distribution, and using $M$-PAM followed
by the Elias scheme and ML detection. It can be simplified as an upper bound by replacing the coefficient $\frac{M-1}{M}$ by
1. Also, since $Q(\cdot)$ is a decreasing function of its argument, $P_e$ can be further upper bounded by replacing $M^2 - 1$
by $M^2$. Thus,

$$P_e \le 2Q(\gamma_n) \qquad (17)$$

where $\gamma_n \ge \sqrt{3}\left(1 - \frac{1}{nS+n}\right)^{n/2}\frac{(1+S)^{n/2}}{M}$.

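For large $M$, which is the case of interest, the above bound is very tight and is essentially an equality, as first
derived by Schalkwijk⁶ in Eq. 12 of [12]. Recalling that $nS \ge 1$ we can further lower bound $\gamma_n$ (thus upper
bounding $P_e$). Substituting $C(S) = \frac{1}{2}\ln(1+S)$ and $M = \exp(nR)$ we get

$$\gamma_n \ge \sqrt{3}\left[1 - \frac{1}{n+1}\right]^{n/2}\exp(n(C(S) - R)). \qquad (18)$$

The bracketed term, raised to the power $n/2$, is decreasing in $n$. Thus,

$$\left(1 - \frac{1}{n+1}\right)^{n/2} \ge \lim_{n\to\infty}\left(1 - \frac{1}{n+1}\right)^{n/2} \qquad (19)$$

$$= e^{-1/2} \qquad \forall n \ge 1. \qquad (20)$$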

Using this together with equations (17) and (18) we get,

$$P_e \le 2Q\left(\sqrt{\tfrac{3}{e}}\,\exp(n(C(S) - R))\right), \qquad (21)$$

or more simply yet,

$$P_e \le 2Q(\exp[n(C(S) - R)]). \qquad (22)$$

Note that for $R < C(S)$, $P_e$ decreases as a second order exponential in $n$.
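To get a feel for how loose the successive simplifications are, the exact expression (16) can be compared with the bounds (21) and (22). The sketch below is illustrative only: the function names are assumptions, $M = \exp(nR)$ is treated as a real number rather than rounded to an integer, and the comparison presumes $nS \ge 1$.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def capacity(S):
    """C(S) = (1/2) ln(1 + S), in nats per channel use."""
    return 0.5 * math.log(1.0 + S)


def sk_exact_pe(n, R, S):
    """Eq. (16) with M = exp(nR) treated as a real number (assumes nS >= 1)."""
    M = math.exp(n * R)
    gamma = math.sqrt(3.0 * (1.0 + S - 1.0 / n) ** n / (M * M - 1.0))
    return 2.0 * (M - 1.0) / M * q_function(gamma)


def sk_bound_21(n, R, S):
    """Right side of (21)."""
    return 2.0 * q_function(math.sqrt(3.0 / math.e) * math.exp(n * (capacity(S) - R)))


def sk_bound_22(n, R, S):
    """Right side of (22)."""
    return 2.0 * q_function(math.exp(n * (capacity(S) - R)))


if __name__ == "__main__":
    S, R = 1.0, 0.2   # illustrative values with R < C(S)
    for n in (2, 5, 10, 20):
        print(n, sk_exact_pe(n, R, S), sk_bound_21(n, R, S), sk_bound_22(n, R, S))
```

For moderately large $n$ all three values underflow to zero in double precision, which is itself a reminder of how fast a second order exponential decays.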

In summary, then, we see that the use of standard $M$-PAM at time 0, followed by the Elias algorithm over the
next $n-1$ transmissions, followed by ML detection, gives rise to a probability of error $P_e$ that decreases as a
second-order exponential for all $R < C(S)$. Also $P_e$ satisfies (21) and (22) for all $n \ge 1/S$.

⁵The term $(M-1)/M$ in (13) arises because the largest and smallest signals each have only one nearest neighbor, whereas all other signals
have two nearest neighbors.

⁶Schalkwijk's work was independent of Elias's. He interpreted the steps in the algorithm as successive improvements in estimating $X_0$
rather than as estimating $Z_0$.
Although $P_e$ decreases as a second-order exponential with this algorithm, the algorithm does not minimize $P_e$
over all algorithms using ideal feedback. The use of standard $M$-PAM at time 0 could be replaced by PAM with
non-equal spacing of the signal points for a modest reduction in $P_e$. Also, as shown in the next section, allowing
transmissions 1 to $n-1$ to make use of the discrete nature of $X_0$ allows for a major reduction in $P_e$.⁷

The algorithm above, however, does have the property that it is optimal among schemes in which, first, standard
PAM is used at time 0 and, second, for each $i$, $1 \le i \le n-1$, $X_i$ is a linear function of $Z_0$ and $Y_1^{i-1}$. The reason
for this is that $Z_0$ and $Y_1^{n-1}$ are then jointly Gaussian and the Elias scheme minimizes the mean square error in
$Z_0$ and thus also minimizes $P_e$.



A. Broadband Analysis: 

Translating these results to a continuous time formulation where the channel is used $2W$ times per second,⁸ the
capacity (in nats per second) is $C_W = 2WC$. Letting $T = n/2W$ and letting $R_W = 2WR$ be the rate in nats per
second, this formula becomes

$$P_e \le 2Q(\exp[(C_W - R_W)T]). \qquad (23)$$

Let $\mathcal{P} = 2WS$ be the continuous-time power constraint, so that $C_W = W\ln(1 + \mathcal{P}/2W)$. In the broadband limit
as $W \to \infty$ for fixed $\mathcal{P}$, $C_W \to \mathcal{P}/2$. Since (23) applies for all $W > 0$, we can simply go to the broadband limit,
$C_\infty = \mathcal{P}/2$. Since the algorithm is basically a discrete time algorithm, however, it makes more sense to view the
infinite bandwidth limit as a limit in which the number of available degrees of freedom $n$ increases faster than
linearly with the constraint time $T$. In this case, the signal-to-noise ratio per degree of freedom, $S = \mathcal{P}T/n$, goes
to 0 with increasing $T$. Rewriting $\gamma_n$ in (17) for this case,



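$$\gamma_n \ge \sqrt{3}\,\exp\left[\frac{n}{2}\ln\left(1 + \frac{\mathcal{P}T}{n} - \frac{1}{n}\right) - T R_\infty\right] \qquad (24)$$

$$\ge \sqrt{3}\,\exp\left[\frac{\mathcal{P}T}{2} - \frac{1}{2} - \frac{\mathcal{P}^2T^2}{4n} - T R_\infty\right] \qquad (25)$$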



where the inequality $\ln(1+x) \ge x - x^2/2$ was used. Note that if $n$ increases quadratically with $T$, then the term
$\frac{\mathcal{P}^2T^2}{4n}$ is simply a constant which becomes negligible as the coefficient on the quadratic becomes large. For example,
if $n \ge 6\mathcal{P}^2T^2$, then this term is at most 1/24 and (25) simplifies to

$$\gamma_n \ge \exp[T(C_\infty - R_\infty)] \quad \text{for } n \ge 6\mathcal{P}^2T^2. \qquad (26)$$
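The step from (25) to (26) rests on the numerical fact that $\sqrt{3}\,\exp(-\tfrac{1}{2} - \tfrac{1}{24}) \ge 1$. A short Python sketch (function names and parameter values are illustrative assumptions) checks this constant and compares the right sides of (24) and (26) when $n \ge 6\mathcal{P}^2T^2$ and $\mathcal{P}T \ge 1$.

```python
import math


def gamma_lower_24(n, P, T, R_inf):
    """Right side of (24), with S = P*T/n substituted."""
    return math.sqrt(3.0) * math.exp(
        0.5 * n * math.log(1.0 + (P * T - 1.0) / n) - T * R_inf
    )


def gamma_lower_26(P, T, R_inf):
    """Right side of (26): exp[T (C_inf - R_inf)] with C_inf = P/2."""
    return math.exp(T * (0.5 * P - R_inf))


if __name__ == "__main__":
    # Constant that must be at least 1 for (26) to follow from (25).
    print("sqrt(3) * exp(-1/2 - 1/24) =", math.sqrt(3.0) * math.exp(-0.5 - 1.0 / 24.0))
    P, R_inf = 2.0, 0.5   # illustrative values with R_inf < C_inf = 1
    for T in (1.0, 2.0, 5.0):
        n = math.ceil(6.0 * P ** 2 * T ** 2)
        print(T, n, gamma_lower_24(n, P, T, R_inf), gamma_lower_26(P, T, R_inf))
```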

This is essentially the same as the broadband SK result (see the final equation in [13]). The result in [13] used 
n = e^^*-^"' degrees of freedom, but chose the subsequent energy levels to be decreasing harmonically, thus slightly 
weakening the coefficient of the result. The broadband result is quite insensitive to the energy levels used for each
degree of freedom,⁹ so long as $S_0$ is close to 1 and the other $S_i$ are close to 0. This partly explains why the
harmonic choice of energy levels in [13] comes reasonably close to the optimum result.

⁷Indeed, Zigangirov [15] developed an algorithm quite similar to that developed in the next section. The initial phase of that algorithm is
very similar to the algorithm [12] just described, with the following differences. Instead of starting with standard $M$-PAM, [15] starts with
a random ensemble of non-equally-spaced $M$-PAM codes ingeniously arranged to form a Gaussian random variable. The Elias scheme is
then used, starting with this Gaussian random variable. Thus the algorithm in [15] has different constraints than those above. It turns out to
have an insignificantly larger Pe (over this phase) than the algorithm here for S greater than [(1/ In ^) — 1] and an insignificantly smaller 
Pe otherwise. 

⁸This is usually referred to as a channel bandlimited to $W$. This is a harmless and universally used abuse of the word bandwidth for
channels without feedback, and refers to the ability to satisfy the Nyquist criterion with arbitrarily little power sent out of band. It is more 
problematic with feedback, since it assumes that the sum of the propagation delay, the duration of the transmit pulse, the duration of the 
matched filter at the receiver, and the corresponding quantities for the feedback, is at most 1/2W. Even allowing for a small fraction of 
out-of-band energy, this requires considerably more than bandwidth W. 

⁹To see this, replace $(1+S_1)^{n-1}$ in (13) by $\exp[\sum_i \ln(1+S_i)]$, each term of which can be lower bounded by the inequality
$\ln(1+x) \ge x - x^2/2$.
Fig. 2. Given that $a_m$ is the sample value of the PAM source signal $U_0$, the sample value of $X_0$ is $a_m d_0$ where $d_0 = \sqrt{S_0}/\sigma_0$. The
figure illustrates the probability density of $Y_0$ given this conditioning and shows the $M$-PAM signal points for $X_0$ that are neighbors to the
sample value $X_0 = a_m d_0$. Note that this density is $\mathcal{N}(a_m d_0, 1)$, i.e., it is the density of $Z_0$, shifted to be centered at $a_m d_0$. Detection
using maximum likelihood at this point simply quantizes $Y_0$ to the nearest signal point.



IV. An alternative PAM Scheme in the high signal-to-noise regime 

In the previous section, Elias's scheme was used to allow the receiver to estimate the noise $Z_0$ originally added
to the PAM signal at time 0. This gave rise to an equivalent observation, $Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n-1}]$, with attenuated noise
$U_n$ as given in (12). The geometric attenuation of $\mathrm{E}[U_n^2]$ with $n$ is the reason why the error probability in the
Schalkwijk and Kailath (SK) [13] scheme decreases as a second order exponential in time.

In this section, we explore an alternative strategy that is again based on the use of $M$-PAM at time 0, but is quite
different from the SK strategy at times 1 to $n-1$. The analysis is restricted to situations in which the signal-to-noise
ratio (SNR) at time 0 is so large that the distance between successive PAM signal points in $X_0$ is large relative to
the standard deviation of the noise. In this high SNR regime, a simpler and more effective strategy than the Elias
scheme suggests itself (see Figure 2). This new strategy is limited to the high SNR regime, but Section V develops
a two-phase scheme that uses the SK strategy for the first part of the block, and switches to this new strategy when
the SNR is sufficiently large.

In this new strategy for the high SNR regime, the receiver makes a tentative ML decision $\hat{m}_0$ at time 0. As seen
in the figure, that decision is correct unless the noise exceeds half the distance $d_0 = \sqrt{S_0}/\sigma_0$ to either the signal
value on the right or the left of the sample value $a_m$ of $U_0$. Each of these two events has probability $Q(d_0/2)$.

The transmitter uses the feedback to calculate $\hat{m}_0$ and chooses the next signal $U_1$ (in the absence of a second-moment
constraint) to be a shifted version of the original $M$-PAM signal, shifted so that $U_1 = \hat{m}_0 - m$ where $m$
is the original message symbol being transmitted. In other words, $U_1$ is the integer-valued error in the receiver's
tentative decision of $U_0$. The corresponding transmitted signal $X_1$ is essentially given by $X_1 = U_1\sqrt{S_1/\mathrm{E}[U_1^2]}$,
where $S_1$ is the energy allocated to $X_1$.

We now give an approximate explanation of why this strategy makes sense and how the subsequent transmissions
are chosen. This is followed by a precise analysis. Temporarily ignoring the case where either $m = 1$ or $m = M$
(i.e., where $a_m$ has only one neighbor), $U_1$ is 0 with probability $1 - 2Q(d_0/2)$. The probability that $|U_1|$ is two or
more is essentially negligible, so $U_1 = \pm 1$ with a probability approximately equal to $2Q(d_0/2)$. Thus

$$\mathrm{E}[U_1^2] \approx 2Q(d_0/2); \qquad X_1 \approx \frac{U_1\sqrt{S_1}}{\sqrt{2Q(d_0/2)}}. \qquad (27)$$

This means that $X_1$ is not only a shifted version of $X_0$, but (since $d_0 = \sqrt{S_0}/\sigma_0$) is also scaled up by a factor
that is exponential in $S_0$ when $S_0$ is sufficiently large. Thus the separation between adjacent signal points in $X_1$ is
exponentially increasing with $S_0$.

This also means that when $X_1$ is transmitted, the situation is roughly the same as that in Figure 2, except that
the distance between signal points is increased by a factor exponential in $S_0$. Thus a tentative decision at time 1
will have an error probability that decreases as a second order exponential in $S_0$.

Repeating the same procedure at time 2 will then give rise to a third order exponential in $S_0$, etc. We now turn
to a precise analysis and description of the algorithm at times 1 to $n-1$.

The following lemma provides an upper bound to the second moment of $U_1$, which was approximated in (27).

Lemma 4.1: For any $d \ge 4$, let $U$ be a $d$-quantization of a normal random variable $Z \sim \mathcal{N}(0,1)$ in the sense
that for each integer $\ell$, if $Z \in (d\ell - \frac{d}{2}, d\ell + \frac{d}{2}]$ then $U = \ell$. Then $\mathrm{E}[U^2]$ is upper bounded by

$$\mathrm{E}[U^2] \le \frac{1.6}{d}\exp\left[-\frac{d^2}{8}\right]. \qquad (28)$$
Note from Figure 2 that, aside from a slight exception described below, $U_1 = \hat{m}_0 - m$ is the same as the
$d_0$-quantization of $Z_0$ where $d_0 = \sqrt{S_0}/\sigma_0$. The slight exception is that $\hat{m}_0$ should always lie between 1 and $M$.
If $Z_0 > (M - m + 1/2)d_0$, then $U_1 = M - m$, whereas the $d_0$-quantization takes on a larger integer value. There
is a similar limit for $Z_0 < (1 - m - 1/2)d_0$. This reduces the magnitude of $U_1$ in the above exceptional cases, and
thus reduces the second moment. Thus the bound in the lemma also applies to $U_1$. For simplicity in what follows,
we avoid this complication by assuming that the receiver allows $\hat{m}_0$ to be larger than $M$ or smaller than 1. This
increases both the error probability and the energy over true ML tentative decisions, so the bounds also apply to
the case with true ML tentative decisions.

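Proof: From the definition of $U$, we see that $U = \ell$ if $Z \in (d\ell - \frac{d}{2}, d\ell + \frac{d}{2}]$. Thus, for $\ell \ge 1$,

$$\Pr[U = \ell] = Q\left(d\ell - \tfrac{d}{2}\right) - Q\left(d\ell + \tfrac{d}{2}\right).$$

From symmetry, $\Pr[U = -\ell] = \Pr[U = \ell]$, so the second moment of $U$ is given by

$$\mathrm{E}[U^2] = 2\sum_{\ell=1}^{\infty} \ell^2\left[Q\left(d\ell - \tfrac{d}{2}\right) - Q\left(d\ell + \tfrac{d}{2}\right)\right] = 2Q(d/2) + 2\sum_{\ell=2}^{\infty}\left[\ell^2 - (\ell-1)^2\right] Q\left(d\ell - \tfrac{d}{2}\right).$$

Using the standard upper bound $Q(x) \le \frac{1}{\sqrt{2\pi}\,x}\exp[-x^2/2]$ for $x > 0$, and recognizing that $\ell^2 - (\ell-1)^2 = 2\ell - 1$,
this becomes

$$\mathrm{E}[U^2] \le \frac{4}{\sqrt{2\pi}\,d}\left\{\exp[-d^2/8] + \sum_{\ell=2}^{\infty}\exp[-(2\ell-1)^2 d^2/8]\right\}$$

$$= \frac{4}{\sqrt{2\pi}\,d}\exp[-d^2/8]\left\{1 + \sum_{\ell=2}^{\infty}\exp\left[-\left((2\ell-1)^2 - 1\right)d^2/8\right]\right\}$$

$$\le \frac{1.6}{d}\exp[-d^2/8] \quad \text{for } d \ge 4. \qquad (29)$$
■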
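Lemma 4.1 can be sanity-checked by summing the series for $\mathrm{E}[U^2]$ directly. The following Python sketch is illustrative: the function names and the truncation level `max_level` are assumptions, and $Q$ is computed with `math.erfc`.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x) = Pr[N(0,1) > x]."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def quantization_second_moment(d, max_level=50):
    """E[U^2] for the d-quantization U of Z ~ N(0,1), truncated at |l| <= max_level."""
    total = 0.0
    for level in range(1, max_level + 1):
        p = q_function(d * level - d / 2.0) - q_function(d * level + d / 2.0)
        total += 2.0 * level * level * p  # factor 2: Pr[U = -l] = Pr[U = l]
    return total


def lemma_4_1_bound(d):
    """Right side of (28)."""
    return 1.6 / d * math.exp(-d * d / 8.0)


if __name__ == "__main__":
    for d in (4.0, 5.0, 6.0, 8.0):
        print(d, quantization_second_moment(d), lemma_4_1_bound(d))
```

The two columns stay within a modest factor of each other over this range, which suggests the constant 1.6 leaves little slack.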

We now define the rest of this new algorithm. We have defined the unconstrained signal $U_1$ at time 1 to be
$\hat{m}_0 - m$ but have not specified the energy constraint to be used in amplifying $U_1$ to $X_1$. The analysis is simplified
by defining $X_1$ in terms of a specified scaling factor between $U_1$ and $X_1$. The energy in $X_1$ is determined later by
this scaling. In particular, let

$$X_1 = d_1 U_1 \quad \text{where } d_1 = \sqrt{8}\,\exp\left(\frac{d_0^2}{16}\right).$$

The peculiar expression for $d_1$ above looks less peculiar when expressed as $d_1^2/8 = \exp(d_0^2/8)$. When $Y_1 = X_1 + Z_1$
is received, we can visualize the situation from Figure 2 again, where now $d_0$ is replaced by $d_1$. The signal set
for $X_1$ is again a PAM set but it now has signal spacing $d_1$ and is centered on the signal corresponding to the
transmitted source symbol $m$. The signals are no longer equally likely, but the analysis is simplified if a maximum
likelihood tentative decision $\hat{m}_1$ is again made. We see that $\hat{m}_1 = \hat{m}_0 - \tilde{Y}_1$ where $\tilde{Y}_1$ is the $d_1$-quantization of $Y_1$
(and where the receiver again allows $\hat{m}_1$ to be an arbitrary integer). We can now state the algorithm for each time
$i$, $1 \le i \le n-1$.

$$d_i = \sqrt{8}\,\exp\left(\frac{d_{i-1}^2}{16}\right) \qquad (30)$$

$$X_i = d_i U_i \qquad (31)$$

$$\hat{m}_i = \hat{m}_{i-1} - \tilde{Y}_i \qquad (32)$$

$$U_{i+1} = \hat{m}_i - m, \qquad (33)$$

where $\tilde{Y}_i$ is the $d_i$-quantization of $Y_i$.
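The steps (30)-(33) can be simulated directly. The sketch below is a minimal illustration rather than the authors' implementation: the names `d_quantize` and `run_high_snr_phase` are assumptions, the receiver is allowed to output any integer (as assumed in the text), and with $d_0 = 4$ only two steps are simulated because $d_3$ already overflows double-precision arithmetic.

```python
import math
import random


def d_quantize(y, d):
    """Integer l such that y lies in (d*l - d/2, d*l + d/2]."""
    return math.ceil(y / d - 0.5)


def run_high_snr_phase(m, d0, steps, rng):
    """Simulate (30)-(33) for `steps` channel uses after the time-0 PAM use.

    Returns the tentative decisions [m_hat_0, ..., m_hat_steps].
    """
    # Time 0: Y_0 is the PAM point of m (spacing d0) plus unit-variance noise,
    # so the ML decision is m shifted by the d0-quantization of Z_0.
    m_hat = m + d_quantize(rng.gauss(0.0, 1.0), d0)
    decisions = [m_hat]
    d_prev = d0
    for _ in range(steps):
        d_i = math.sqrt(8.0) * math.exp(d_prev ** 2 / 16.0)  # eq. (30)
        u_i = m_hat - m                   # transmitter learns m_hat via feedback
        y_i = d_i * u_i + rng.gauss(0.0, 1.0)                 # eq. (31) plus noise
        m_hat = m_hat - d_quantize(y_i, d_i)                  # eq. (32)
        decisions.append(m_hat)
        d_prev = d_i
    return decisions


if __name__ == "__main__":
    rng = random.Random(0)
    runs = [run_high_snr_phase(5, 4.0, 2, rng) for _ in range(10000)]
    for i in range(3):
        wrong = sum(1 for r in runs if r[i] != 5)
        print("time", i, "tentative decision errors:", wrong)
```

In such a run the number of wrong tentative decisions drops sharply from one time step to the next, in line with the rapidly growing spacing $d_i$.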

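Lemma 4.2: For $d_0 \ge 4$, the algorithm of (30)-(33) satisfies the following for all alphabet sizes $M$ and all
message symbols $m$:

$$\frac{d_i^2}{8} = g_i\left(\frac{d_0^2}{8}\right) \ge g_i(2). \qquad (34)$$

$$\mathrm{E}[X_i^2] \le \frac{12.8}{d_{i-1}}. \qquad (35)$$

$$\sum_{i=1}^{\infty} \mathrm{E}[X_i^2] \le 5. \qquad (36)$$

$$\Pr(\hat{m}_i \ne m) \le 1/g_{i+1}(2), \qquad (37)$$

where $g_i(x) = \exp(\cdots(\exp(x))\cdots)$ with $i$ exponentials.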
Proof: From the definition of $d_i$ in (30),

$$\frac{d_i^2}{8} = \exp\left(\frac{d_{i-1}^2}{8}\right) = \exp\left(\exp\left(\frac{d_{i-2}^2}{8}\right)\right) = \cdots = g_i\left(\frac{d_0^2}{8}\right).$$

This establishes the first part of (34), and the inequality follows since $d_0 \ge 4$ and $g_i(x)$ is increasing in $x$.
Next, since $X_i = d_i U_i$, we can use (34) and Lemma 4.1 to see that

$$\mathrm{E}[X_i^2] = d_i^2\,\mathrm{E}[U_i^2] \le \left(8\exp\left(\frac{d_{i-1}^2}{8}\right)\right)\left(\frac{1.6}{d_{i-1}}\exp\left(-\frac{d_{i-1}^2}{8}\right)\right) = \frac{12.8}{d_{i-1}},$$

where we have canceled the exponential terms, establishing (35).

To establish (36), note that each $d_i$ is increasing as a function of $d_0$, and thus each $\mathrm{E}[X_i^2]$ is upper bounded by
taking $d_0$ to be 4. Then $\mathrm{E}[X_1^2] \le 3.2$, $\mathrm{E}[X_2^2] \le 1.6648$, and the other terms can be bounded in a geometric
series with a sum less than 0.12.

Finally, to establish (37), note that

$$\Pr(\hat{m}_i \ne m) = \Pr(|U_{i+1}|^2 \ge 1) \le \mathrm{E}[U_{i+1}^2]$$

$$\overset{(a)}{\le} \frac{1.6}{d_i}\exp(-d_i^2/8) \overset{(b)}{\le} \exp(-d_i^2/8)$$

$$\overset{(c)}{=} 1/\exp(g_i(d_0^2/8)) \overset{(d)}{\le} 1/g_{i+1}(2),$$

where we have used Lemma 4.1 in (a), the fact that $d_i \ge 4$ in (b), and equation (34) in (c) and (d). ■
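The numbers quoted in the proof of (36) can be reproduced from (30) and (35). The Python sketch below uses illustrative function names; only the first three energy bounds are computed, since $d_3$ is too large for double-precision arithmetic when $d_0 = 4$.

```python
import math


def spacing_sequence(d0, count):
    """[d_0, d_1, ..., d_count] obtained by iterating (30)."""
    ds = [d0]
    for _ in range(count):
        ds.append(math.sqrt(8.0) * math.exp(ds[-1] ** 2 / 16.0))
    return ds


def energy_bounds(d0, count):
    """Upper bounds 12.8 / d_{i-1} on E[X_i^2] for i = 1..count, from (35)."""
    return [12.8 / d for d in spacing_sequence(d0, count - 1)]


if __name__ == "__main__":
    print("d_0..d_2      :", spacing_sequence(4.0, 2))
    bounds = energy_bounds(4.0, 3)
    print("energy bounds :", bounds)
    print("partial sum   :", sum(bounds))
```

The remaining terms are bounded by $12.8/d_i$ with $d_i$ growing faster than any fixed-order exponential, so they add a negligible amount to the partial sum.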
We have now shown that, in this high SNR regime, the error probability decreases with time $i$ as an $i$th order
exponent. The constants involved, such as $d_0 \ge 4$, are somewhat ad hoc, and the details of the derivation are
similarly ad hoc. What is happening, as stated before, is that by using PAM centered on the receiver's current
tentative decision, one can achieve rapidly expanding signal point separation with small energy. This is the critical
idea driving this algorithm, and in essence this idea was used earlier by¹⁰ Zigangirov [15].



V. A TWO-PHASE STRATEGY 

We now combine the Schalkwijk-Kailath (SK) scheme of Section III and the high SNR scheme of Section IV
into a two-phase strategy. The first phase, of block length $n_1$, uses the SK scheme. At time $n_1 - 1$, the equivalent
received signal $Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n_1-1}]$ (see (12)) is used in an ML decoder to detect the original PAM signal $X_0$ in
the presence of additive Gaussian noise of variance $\sigma_{n_1}^2$.

¹⁰However, unlike the scheme presented above, in Zigangirov's scheme the total amount of energy needed for transmission is increasing
linearly with time.
Note that if we scale the equivalent received signal, $Y_0 - \mathrm{E}[Z_0 \mid Y_1^{n_1-1}]$, by a factor of $1/\sigma_{n_1}$ so as to have an
equivalent unit variance additive noise, we see that the distance between adjacent signal points in the normalized
PAM is $d_{n_1-1} = 2\gamma_{n_1}$ where $\gamma_{n_1}$ is given in (13). If $n_1$ is selected to be large enough to satisfy $d_{n_1-1} \ge 4$, then
this detection at time $n_1 - 1$ satisfies the criterion assumed at time 0 of the high SNR algorithm of Section IV. In
other words, the SK algorithm not only achieves the error probability calculated in Section III but also, if the block
length of the SK phase $n_1$ is chosen to be large enough, it creates the initial condition for the high SNR algorithm.
That is, it provides the receiver and the transmitter at time $n_1 - 1$ with the output of a high signal-to-noise ratio
PAM. Consequently not only is the tentative ML decision at time $n_1 - 1$ correct with moderately high probability,
but also the probability of the distant neighbors of the decoded messages vanishes rapidly.

The intuition behind this two-phase scheme is that the SK algorithm seems to be quite efficient when the signal
points are so close (relative to the noise) that the discrete nature of the signal is not of great benefit. When the SK
scheme is used enough times, however, the signal points become far apart relative to the noise, and the discrete
nature of the signal becomes important. The increased effective distance between the signal points of the original
PAM also makes the high SNR scheme feasible. Thus the two-phase strategy switches to the high SNR scheme at
this point and the high SNR scheme drives the error probability to 0 as an $n_2$ order exponential.

We now turn to the detailed analysis of this two-phase scheme. Note that 5 units of energy must be reserved for
phase 2 of the algorithm, so the power constraint $S_1$ for the first phase of the algorithm is $n_1 S_1 = nS - 5$. For
any fixed rate $R < C(S)$, we will find that the remaining $n_2 = n - n_1$ time units is a linearly increasing function
of $n$ and yields an error probability upper bounded by $1/g_{n_2+1}(2)$.



A. The finite-bandwidth case 

For the finite-bandwidth case, we assume an overall block length $n = n_1 + n_2$, an overall power constraint $S$,
and an overall rate $R = (\ln M)/n$. The overall energy available for phase 1 is at least $nS - 5$, so the average
power in phase 1 is at least $(nS - 5)/n_1$.

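We observed that the distance $d_{n_1}$ between adjacent signal points, assuming that signal and noise are normalized
to unit noise variance, is twice the parameter $\gamma_{n_1}$ given in (16). Rewriting (16) for the power constraint $(nS - 5)/n_1$,

$$\begin{aligned}
d_{n_1} &\ge 2\sqrt{3}\left(1 + \frac{nS}{n_1} - \frac{6}{n_1}\right)^{n_1/2} \exp(-nR) \\
&= 2\sqrt{3}\left(1 + \frac{nS}{n_1}\right)^{n_1/2} \exp(-nR)\left(1 - \frac{6}{nS + n_1}\right)^{n_1/2} \\
&\overset{(a)}{\ge} 2\sqrt{3}\left(1 + \frac{nS}{n_1}\right)^{n_1/2} \exp(-nR)\left(1 - \frac{1}{1 + n_1/6}\right)^{n_1/2},
\end{aligned} \tag{38}$$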



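where to get (a) we assumed that $nS \ge 6$. We can also show that the multiplicative term, $\left(1 - \frac{1}{1+n_1/6}\right)^{n_1/2}$, is a
decreasing function of $n_1$ satisfying

$$\left(1 - \frac{1}{1 + n_1/6}\right)^{n_1/2} \ge \lim_{n_1 \to \infty}\left(1 - \frac{1}{1 + n_1/6}\right)^{n_1/2} = e^{-3}.$$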
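The monotonicity and the limiting value of the multiplicative term can be checked numerically. The following is only an illustrative sketch: the function name `multiplicative_term` and the sample values of $n_1$ are assumptions introduced here, not part of the original analysis.

```python
import math


def multiplicative_term(n1: float) -> float:
    """Last factor of (38): (1 - 1/(1 + n1/6)) ** (n1/2)."""
    return (1.0 - 1.0 / (1.0 + n1 / 6.0)) ** (n1 / 2.0)


# Hypothetical sample points, chosen only for illustration.
SAMPLE_N1 = (1, 10, 100, 1000, 10**6)
TERM_VALUES = [multiplicative_term(n1) for n1 in SAMPLE_N1]
TERM_LIMIT = math.exp(-3.0)  # limiting value as n1 grows without bound

if __name__ == "__main__":
    for n1, value in zip(SAMPLE_N1, TERM_VALUES):
        print(f"n1 = {n1:>7}: term = {value:.6f} (limit {TERM_LIMIT:.6f})")
```

On these sample points the term decreases toward $e^{-3}$ from above, which is the property used to replace it by a constant.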



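This establishes (38). In order to satisfy $d_{n_1} \ge 4$, it suffices for the right-hand side of (38), with the multiplicative
term replaced by its limiting value $e^{-3}$, to be greater than or equal to 4. Letting $\nu = n_1/n$, this condition can be rewritten as

$$\exp\left[n\left(-R + \frac{\nu}{2}\ln\left(1 + \frac{S}{\nu}\right)\right)\right] \ge \frac{2e^{3}}{\sqrt{3}}. \tag{39}$$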



Define $\phi(\nu)$ by

$$\phi(\nu) = \frac{\nu}{2}\ln\left(1 + \frac{S}{\nu}\right).$$



This is a concave increasing function for $0 < \nu \le 1$ and can be interpreted as the capacity of the given channel
if the number of available degrees of freedom is reduced from $n$ to $\nu n$ without changing the available energy per
block, i.e., it can be interpreted as the capacity of a continuous-time channel whose bandwidth has been reduced
by a factor of $\nu$. We can then rewrite (39) as

$$\phi(\nu) \ge R + \frac{\beta}{n}, \tag{40}$$

where $\beta = \ln\left(2e^{3}/\sqrt{3}\right)$. This is interpreted in Figure 3.







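[Figure 3: plot of $\phi(\nu)$ versus $\nu$. The vertical axis marks $R$, $R + \beta/n$ and $C$; the horizontal axis marks $\phi^{-1}(R)$.]

Fig. 3. This shows the function $\phi(\nu)$ and also the value of $\nu$, denoted $\phi^{-1}(R)$, at which $\phi(\nu) = R$. It also shows $\nu'_n$, which
satisfies $\phi(\nu'_n) = R + \beta/n$, and gives the solution to (40) with equality. It turns out to be more convenient to satisfy (40) with
inequality using $\nu_n$, which by simple geometry satisfies $\nu_n = \phi^{-1}(R) + \frac{\beta(1 - \phi^{-1}(R))}{n(C - R)}$.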

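The condition $d_{n_1} \ge 4$ is satisfied by choosing $n_1 = \lceil n\nu_n \rceil$ for $\nu_n$ defined in Figure 3, i.e.,

$$n_1 = \left\lceil n\phi^{-1}(R) + \frac{\beta\,(1 - \phi^{-1}(R))}{C - R} \right\rceil.$$

Thus the duration $n_2$ of phase 2 can be chosen to be

$$n_2 = \left\lfloor n\,[1 - \phi^{-1}(R)] - \frac{\beta\,(1 - \phi^{-1}(R))}{C - R} \right\rfloor. \tag{41}$$

This shows that $n_2$ increases linearly with $n$ at rate $1 - \phi^{-1}(R)$ for $n \ge \beta/(C - R)$. As a result of the lemma of Section IV,
the error probability is upper bounded as

$$\Pr(\hat{m} \ne m) \le 1/g_{n_2+1}(2). \tag{42}$$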



Thus the probability of error is bounded by an exponential order that increases at a rate $1 - \phi^{-1}(R)$. We later
derive a lower bound to error probability which has this same rate of increase for the exponential order of error
probability.
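The choice of phase lengths described in this subsection can be sketched numerically. The code below is a minimal illustration, not the authors' implementation: the function names, the bisection-based inverse of $\phi$, and the sample parameters ($n = 100$, $R = 0.4$, $S = 3$) are all assumptions made here, and the constant $\beta = \ln(2e^{3}/\sqrt{3})$ is taken from the condition that the lower bound on $d_{n_1}$ be at least 4.

```python
import math


def phi(nu: float, S: float) -> float:
    """phi(nu) = (nu/2) * ln(1 + S/nu), for 0 < nu <= 1."""
    return 0.5 * nu * math.log1p(S / nu)


def phi_inverse(R: float, S: float, tol: float = 1e-12) -> float:
    """Solve phi(nu) = R by bisection (phi is increasing on (0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, S) < R:
            lo = mid
        else:
            hi = mid
    return hi


# beta = ln(2 e^3 / sqrt(3)), the constant in the condition on phi(nu).
BETA = math.log(2.0 * math.exp(3.0) / math.sqrt(3.0))


def two_phase_lengths(n: int, R: float, S: float):
    """Return (n1, n2) with n1 = ceil(n * nu_n) and n2 = n - n1."""
    C = phi(1.0, S)  # capacity (1/2) ln(1 + S)
    if not 0.0 < R < C:
        raise ValueError("need 0 < R < C")
    if n * S < 6.0 or n < BETA / (C - R):
        raise ValueError("block length too short for the bounds used here")
    nu0 = phi_inverse(R, S)
    nu_n = nu0 + BETA * (1.0 - nu0) / (n * (C - R))
    n1 = math.ceil(n * nu_n)
    return n1, n - n1


def log_distance_bound(n: int, n1: int, R: float, S: float) -> float:
    """ln of the lower bound on d_{n1}, with the last factor set to e^{-3}."""
    return (math.log(2.0 * math.sqrt(3.0)) - 3.0
            + 0.5 * n1 * math.log1p(n * S / n1) - n * R)


if __name__ == "__main__":
    n1, n2 = two_phase_lengths(100, 0.4, 3.0)
    print(n1, n2, log_distance_bound(100, n1, 0.4, 3.0) >= math.log(4.0))
```

Working with the logarithm of the distance bound avoids overflow for long blocks; the check against $\ln 4$ corresponds to the requirement $d_{n_1} \ge 4$.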



B. The broadband case - zero error probability 

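The broadband case is somewhat simpler since an unlimited number of degrees of freedom are available. For
phase 1, we start with equation (24), modified by the fact that 5 units of energy must be reserved for phase 2.

$$\begin{aligned}
d_{n_1} &\ge 2\sqrt{3}\,\exp\left[\frac{n_1}{2}\ln\left(1 + \frac{\mathcal{P}T}{n_1} - \frac{6}{n_1}\right) - TR_\infty\right] \\
&\ge 2\sqrt{3}\,\exp\left[\frac{\mathcal{P}T}{2} - 3 - \frac{\mathcal{P}^2T^2}{4n_1} - TR_\infty\right],
\end{aligned}$$

where, in order to get the inequality in the second step, we assumed that $\mathcal{P}T \ge 6$ and used the inequality $\ln(1 + x) \ge
x - x^2/2$. As in the broadband SK analysis, we assume that $n_1$ is increasing quadratically with increasing $T$. Then
$\mathcal{P}^2T^2/(4n_1)$ becomes just a constant. Specifically, if $n_1 \ge \mathcal{P}^2T^2/4$ we get

$$d_{n_1} \ge \frac{2\sqrt{3}}{e^{4}}\exp[T(C_\infty - R_\infty)].$$

It follows that $d_{n_1} \ge 4$ if

$$T \ge \frac{4 + \ln 2 - 0.5\ln 3}{C_\infty - R_\infty}. \tag{43}$$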



If (43) is satisfied, then phase 2 can be carried out for arbitrarily large $n_2$, with $P_e$ satisfying (42). In principle, $n_2$
can be infinite, so $P_e$ becomes 0 whenever $T$ is large enough to satisfy (43).

One might object that the transmitter sequence is not well defined with $n_2 = \infty$, but in fact it is, since at most
a finite number of transmitted symbols can be nonzero. One might also object that it is impossible to obtain an
infinite number of ideal feedback signals in finite time. This objection is certainly valid, but the entire idea of ideal
feedback with infinite bandwidth is unrealistic. Perhaps a more comfortable way to express this result is that 0 is
the greatest lower bound to error probability when (43) is satisfied, i.e., any desired error probability, no matter
how small, is achievable if the continuous-time block length $T$ satisfies (43).
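The broadband threshold can be evaluated with a short numerical sketch. Everything named below is an assumption made for illustration: the function names, the sample power and rate, and in particular the normalization $C_\infty = \mathcal{P}/2$, which is what makes the exponent $\mathcal{P}T/2 - TR_\infty$ equal to $T(C_\infty - R_\infty)$.

```python
import math


def broadband_min_duration(power: float, rate_inf: float) -> float:
    """Smallest T allowed by (43), assuming C_inf = power / 2."""
    c_inf = power / 2.0
    if not 0.0 < rate_inf < c_inf:
        raise ValueError("need 0 < R_inf < C_inf")
    return (4.0 + math.log(2.0) - 0.5 * math.log(3.0)) / (c_inf - rate_inf)


def phase1_length(power: float, duration: float) -> int:
    """Smallest integer n1 with n1 >= (power * duration)^2 / 4."""
    return math.ceil((power * duration) ** 2 / 4.0)


def log_distance(power: float, duration: float, rate_inf: float, n1: int) -> float:
    """ln of the first lower bound on d_{n1} in the broadband case."""
    energy = power * duration
    if energy < 6.0:
        raise ValueError("the derivation assumes power * duration >= 6")
    return (math.log(2.0 * math.sqrt(3.0))
            + 0.5 * n1 * math.log1p((energy - 6.0) / n1)
            - duration * rate_inf)


if __name__ == "__main__":
    P, R_INF = 2.0, 0.5  # hypothetical sample values
    t_min = broadband_min_duration(P, R_INF)
    T = 8.3  # any duration at or above t_min
    n1 = phase1_length(P, T)
    print(t_min, n1, log_distance(P, T, R_INF, n1) >= math.log(4.0))
```

For any duration at or above the threshold, the computed bound on $\ln d_{n_1}$ should be at least $\ln 4$, so phase 2 can start from the required initial condition.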

VI. A LOWER BOUND TO ERROR PROBABILITY 

The previous sections have derived upper bounds to the probability of decoding error for data transmission using
particular block coding schemes with ideal feedback. These schemes are non-optimal, with the non-optimalities
chosen both for analytical convenience and for algorithmic simplicity. It appears that the optimal strategy is quite
complicated and probably not very interesting. For example, even with a block length $n = 1$ and a message set size
$M = 4$, PAM with equi-spaced messages is neither optimal in the sense of minimizing average error probability
over the message set (see Exercise 6.3 of [6]) nor in the sense of minimizing the error probability of the worst
message. Aside from this rather unimportant non-optimality, the SK scheme is also non-optimal in ignoring the
discrete nature of the signal until the final decision. Finally, the improved algorithm of Section V is non-optimal
both in using ML rather than maximum a posteriori probability (MAP) for the tentative decisions and in not
optimizing the choice of signal points as a function of the prior received signals.

The most important open question, in light of the extraordinarily rapid decrease of error probability with block
length for the finite bandwidth case, is whether any strictly positive lower bound to error probability exists for fixed
block length $n$. To demonstrate that there is such a positive lower bound, we first derive a lower bound to error
probability for the special case of a message set of size $M = 2$. Then we generalize this to codes of arbitrary rate
and show that for $R < C$, the lower bound decreases as a $k$th order exponential where $k$ increases with the block
length $n$ and has the form $k = an - b'$, where the coefficient $a$ is the same as that in the upper bound in Section
V. It is more convenient in this section to number the successive signals from 1 to $n$ rather than 0 to $n - 1$ as in
previous sections.

A. A lower bound for M = 2 

Although it is difficult to find and evaluate the entire optimal code, even for $M = 2$, it turns out to be easy to find
the optimal encoding in the last step. Thus, for each $y_1^{n-1}$, we want to find the optimal choice of $X_n = f(U, y_1^{n-1})$
as a function of, first, the encoding functions $X_i = f(U, Y_1^{i-1})$, $1 \le i \le n - 1$, and, second, the allocation of
energy, $S = E[X_n^2 \mid y_1^{n-1}]$ for that $y_1^{n-1}$. We will evaluate the error probability for such an optimal encoding at time
$n$ and then relate it to the error probability that would have resulted from decoding at time $n - 1$. We will use this
relation to develop a recursive lower bound to error probability at each time $i$ in terms of that at time $i - 1$.

For a given code function $X_i = f(U, Y_1^{i-1})$ for $1 \le i \le n - 1$, the conditional probability density* of $Y_1^{n-1}$ given
$U = 1$ or 2 is positive for all sample values of $Y_1^{n-1}$; thus the corresponding conditional probabilities of hypotheses
$U = 1$ and $U = 2$ are positive, i.e.,

$$\Pr(U = m \mid y_1^{n-1}) > 0, \qquad m \in \{1, 2\}, \quad \forall\, y_1^{n-1} \in \mathbb{R}^{n-1}.$$

In particular, for $m \in \{1, 2\}$, define $\Phi_m = \Pr(U = m \mid y_1^{n-1})$ for some given $y_1^{n-1}$. Finding the error probability
$\Psi = \Pr(\hat{U}(Y_1^{n}) \ne U \mid y_1^{n-1})$ is an elementary binary detection problem for the given $y_1^{n-1}$. MAP detection, using
the a priori probabilities $\Phi_1$ and $\Phi_2$, minimizes the resulting error probability.

For a given sample value of $y_1^{n-1}$, let $b_1$ and $b_2$ be the values of $X_n$ for $U = 1$ and 2 respectively. Let $a$ be half
the distance between $b_1$ and $b_2$, i.e., $2a = b_2 - b_1$. The error probability depends on $b_1$ and $b_2$ only through $a$.
For a given $S$, we choose $b_1$ and $b_2$ to satisfy $E[X_n \mid y_1^{n-1}] = 0$, thus maximizing $a$ for the given $S$. The variance
of $X_n$ conditional on $y_1^{n-1}$ is given by

$$\mathrm{Var}(X_n \mid y_1^{n-1}) = \Phi_1\Phi_2\,(b_2 - b_1)^2 = 4\Phi_1\Phi_2\,a^2,$$

and since $E[X_n \mid y_1^{n-1}] = 0$, this means that $a$ is related to $S$ by $S = 4\Phi_1\Phi_2\,a^2$.

* We do not use the value of this density, but for completeness, it can be seen to be $\prod_{i=1}^{n-1} \varphi[Y_i - f(U, Y_1^{i-1})]$, where $\varphi(x)$ is the normal
density $(2\pi)^{-1/2}\exp(-x^2/2)$.

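Now let $\Phi = \min\{\Phi_1, \Phi_2\}$. Note that $\Phi$ is the probability of error for a hypothetical MAP decoder detecting $U$
at time $n - 1$ from $y_1^{n-1}$. The error probability $\Psi$ for the MAP decoder at the end of time $n$ is given by the classic
result of binary MAP detection with a priori probabilities $\Phi$ and $1 - \Phi$,

$$\Psi = (1 - \Phi)\,Q\!\left(\frac{\eta}{2a} + a\right) + \Phi\,Q\!\left(a - \frac{\eta}{2a}\right), \tag{44}$$

where $\eta = \ln\frac{1-\Phi}{\Phi}$ and $Q(x) = \int_x^{\infty} (2\pi)^{-1/2}\exp(-z^2/2)\,dz$. This equation relates the error probability $\Psi$ at the end
of time $n$ to the error probability $\Phi$ at the end of time $n - 1$, both conditional on $Y_1^{n-1}$. We are now going to view
$\Psi$ and $\Phi$ as functions of $Y_1^{n-1}$, and thus as random variables. Similarly, $S \ge 0$ can be any non-negative function
of $Y_1^{n-1}$, subject to a constraint $S_n$ on its mean; so we can view $S$ as an arbitrary non-negative random variable
with mean $S_n$. For each $Y_1^{n-1}$, $S$ and $\Phi$ determine the value of $a$; thus $a$ is also a non-negative random variable.

We are now going to lower bound the expected value of $\Psi$ in such a way that the result is a function only of
the expected value of $\Phi$ and the expected value $S_n$ of $S$. Note that $\Psi$ in (44) can be lower bounded by ignoring
the first term and replacing the second term with $\Phi\,Q(a)$. Thus,

$$\Psi \ge \Phi\,Q(a) = \Phi\,Q\!\left(\sqrt{\frac{S}{4\Phi(1-\Phi)}}\right) \ge \Phi\,Q\!\left(\sqrt{\frac{S}{2\Phi}}\right), \tag{45}$$

where the last step uses the facts that $Q(x)$ is a decreasing function of $x$ and that $1 - \Phi \ge 1/2$. Taking expectations,

$$E[\Psi] \ge E\!\left[\Phi\,Q\!\left(\sqrt{\frac{S}{2\Phi}}\right)\right] \ge E[\Phi]\,Q\!\left(\frac{E\big[\sqrt{\Phi S/2}\,\big]}{E[\Phi]}\right) \tag{46}$$

$$\ge E[\Phi]\,Q\!\left(\frac{\sqrt{E[\Phi]\,E[S]/2}}{E[\Phi]}\right) \tag{47}$$

$$= E[\Phi]\,Q\!\left(\sqrt{\frac{S_n}{2E[\Phi]}}\right). \tag{48}$$

In (46), we used Jensen's inequality, based on the facts that $Q(x)$ is a convex function for $x \ge 0$ and that $\Phi/E[\Phi]$
is a probability distribution on $Y_1^{n-1}$. In (47), we used the Schwarz inequality along with the fact that $Q(x)$ is
decreasing for $x \ge 0$.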
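The conditional bound (45) can be checked numerically against the exact binary MAP error probability. This is a hedged sketch: the function names and the grid of sample values are assumptions made here, and the exact expression used for $\Psi$ is the standard antipodal MAP formula with unit noise variance, priors $\Phi$ and $1 - \Phi$, and half-distance $a = \sqrt{S/(4\Phi(1-\Phi))}$.

```python
import math


def Q(x: float) -> float:
    """Gaussian tail probability, computed from the complementary error function."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def map_error(phi_min: float, energy: float) -> float:
    """Binary MAP error probability for priors phi_min <= 1/2 and 1 - phi_min."""
    a = math.sqrt(energy / (4.0 * phi_min * (1.0 - phi_min)))
    eta = math.log((1.0 - phi_min) / phi_min)
    return ((1.0 - phi_min) * Q(eta / (2.0 * a) + a)
            + phi_min * Q(a - eta / (2.0 * a)))


def lower_bound(phi_min: float, energy: float) -> float:
    """Right-hand side of (45): phi_min * Q(sqrt(energy / (2 phi_min)))."""
    return phi_min * Q(math.sqrt(energy / (2.0 * phi_min)))


# Hypothetical sample grid, for illustration only.
GRID = [(p, s) for p in (0.01, 0.1, 0.3, 0.5) for s in (0.1, 1.0, 5.0)]
BOUND_HOLDS = all(map_error(p, s) >= lower_bound(p, s) for p, s in GRID)

if __name__ == "__main__":
    for p, s in GRID:
        print(p, s, map_error(p, s), lower_bound(p, s))
```

On this grid the exact error probability stays between the lower bound of (45) and the prior error probability $\Phi$, as expected for a MAP decision.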

We now recognize that $E[\Psi]$ is simply the overall error probability at the end of time $n$ and $E[\Phi]$ is the overall
error probability (if a MAP decision were made) at the end of time $n - 1$. Thus we denote these quantities as $p_n$
and $p_{n-1}$ respectively,

$$p_n \ge p_{n-1}\,Q\!\left(\sqrt{\frac{S_n}{2p_{n-1}}}\right). \tag{49}$$



Note that this lower bound is monotone increasing in $p_{n-1}$. Thus we can further lower bound $p_n$ by lower
bounding $p_{n-1}$. We can lower bound $p_{n-1}$ (for a given $p_{n-2}$ and $S_{n-1}$) in exactly the same way, so that $p_{n-1} \ge
p_{n-2}\,Q\big(\sqrt{S_{n-1}/2p_{n-2}}\big)$. These two bounds can be combined to implicitly bound $p_n$ in terms of $p_{n-2}$, $S_n$ and $S_{n-1}$.
In fact, the same technique can be used for each $i$, $1 \le i \le n$, getting

$$p_i \ge p_{i-1}\,Q\!\left(\sqrt{\frac{S_i}{2p_{i-1}}}\right). \tag{50}$$

This gives us a recursive lower bound on $p_n$ for any given choice of $S_1, \ldots, S_n$ subject to the power constraint

$$\sum_{i=1}^{n} S_i \le nS.$$

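We have been unable to find a clean way to optimize this over the choice of $S_1, \ldots, S_n$, so as a very crude
lower bound on $p_n$, we upper bound each $S_i$ by $nS$. For convenience, multiply each side of (50) by $2/nS$,

$$\frac{2p_i}{nS} \ge \frac{2p_{i-1}}{nS}\,Q\!\left(\sqrt{\frac{nS}{2p_{i-1}}}\right) \qquad \text{for } 1 \le i \le n. \tag{51}$$

At this point, we can see what is happening in this lower bound. As $p_i$ approaches 0, $\frac{nS}{2p_i} \to \infty$. Also $Q(\sqrt{x})$
approaches 0 as $e^{-x/2}$. Now we will lower bound the expression on the right-hand side of (51). We can check
numerically* that for $x \ge 9$,

$$\frac{1}{x}\,Q(\sqrt{x}) \ge \exp(-x). \tag{52}$$

Furthermore $\frac{1}{x}Q(\sqrt{x})$ is decreasing in $x$ for all $x > 0$, and thus

$$\frac{1}{x}\,Q(\sqrt{x}) \ge \exp(-\max\{x, 9\}) \qquad \forall x > 0.$$

Substituting this into (51) we get

$$\frac{2p_i}{nS} \ge \frac{1}{\exp\!\left(\max\left\{\frac{nS}{2p_{i-1}}, 9\right\}\right)} \qquad \text{for } 1 \le i \le n.$$

Applying this recursively for $i = n$ down to $i = k + 1$ for any $k \ge 0$ we get

$$\begin{aligned}
\frac{2p_n}{nS} &\ge \frac{1}{\exp\!\left(\max\left\{\exp\!\left(\max\left\{\frac{nS}{2p_{n-2}}, 9\right\}\right), 9\right\}\right)} \\
&\overset{(a)}{=} \frac{1}{\exp\!\left(\exp\!\left(\max\left\{\frac{nS}{2p_{n-2}}, 9\right\}\right)\right)} \\
&\ge \frac{1}{g_{n-k}\!\left(\max\left\{\frac{nS}{2p_k}, 9\right\}\right)},
\end{aligned} \tag{53}$$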



where (a) simply follows from the fact that $\exp(9) > 9$. This bound holds for $k = 0$, giving an overall lower bound
on error probability in terms of $p_0$. In the usual case where the symbols are initially equiprobable, $p_0 = 1/2$ and

$$p_n \ge \frac{nS}{2\,g_n(\max\{nS, 9\})}. \tag{54}$$

Note that this lower bound is an $n$th order exponential. Although it is numerically much smaller than the upper
bound in Section V, it has the same general form. The intuitive interpretation is also similar. In going from block
length $n - 1$ to $n$, with very small error probability at $n - 1$, the symbol of large a priori probability is very close
to 0 and the other symbol is approximately at $\sqrt{S_n/p_{n-1}}$. Thus the error probability is decreased in one time unit
by an exponential in $p_{n-1}$, leading to an $n$th order exponential over $n$ time units.
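The numerical claim (52) and the first steps of the recursion can be explored with a short script. This is an illustrative sketch only: the function names, the grid of $x$ values, and the per-step energy of 1 are assumptions made here. Only two recursion steps are taken because the iterated exponentials quickly exceed floating-point range.

```python
import math


def Q(x: float) -> float:
    """Gaussian tail probability."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def recursion_step(p_prev: float, energy: float) -> float:
    """Right-hand side of (50) with S_i replaced by `energy`."""
    return p_prev * Q(math.sqrt(energy / (2.0 * p_prev)))


def crude_step(p_prev: float, energy: float) -> float:
    """Cruder bound (energy / 2) * exp(-max{energy / (2 p_prev), 9})."""
    return 0.5 * energy * math.exp(-max(energy / (2.0 * p_prev), 9.0))


# Check (52) on a hypothetical grid 9 <= x <= 508.5 (kept within float range).
X_GRID = [9.0 + 0.5 * k for k in range(1000)]
HOLDS_52 = all(Q(math.sqrt(x)) / x >= math.exp(-x) for x in X_GRID)

# Two recursion steps starting from equiprobable symbols, p_0 = 1/2.
ENERGY = 1.0  # hypothetical value standing in for nS
P0 = 0.5
P1 = recursion_step(P0, ENERGY)
P2 = recursion_step(P1, ENERGY)

if __name__ == "__main__":
    print(HOLDS_52, P1, crude_step(P0, ENERGY), P2, crude_step(P1, ENERGY))
```

Each value produced by `recursion_step` should dominate the corresponding `crude_step` value, which is the substitution that leads from (51) to the iterated-exponential bound.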



B. Lower bound for arbitrary M 

Next consider feedback codes of arbitrary rate $R < C$ with sufficiently large blocklength $n$ and $M = e^{nR}$
codewords. We derive a lower bound on error probability by splitting $n$ into an initial segment of length $n_1$ and
a final segment of length $n_2 = n - n_1$. This segmentation is for bounding purposes only and does not restrict the
feedback code. The error probability of a hypothetical MAP decoder at the end of the first segment, $P_e(n_1)$, can
be lower bounded by a conventional use of the Fano inequality. We will show how to use this error probability as
the input of the lower bound for the $M = 2$ case derived in the previous subsection, i.e., equation (53). There is still
the question of allocating power between the two segments, and since we are deriving a lower bound, we simply
assume that the entire available energy is available in the first segment, and can be reused in the second segment.
We will find that the resulting lower bound has the same form as the upper bound in Section V.

* That is, we can check numerically that (52) is satisfied for $x = 9$ and verify that the right-hand side is decreasing faster than the left
for $x > 9$.

Using energy $nS$ over the first segment corresponds to power $nS/n_1$, and since feedback does not increase the
channel capacity, the average directed mutual information over the first segment is at most $n_1 C(nS/n_1)$. Reusing
the definitions $\nu = n_1/n$ and $\phi(\nu) = \frac{\nu}{2}\ln\left(1 + \frac{S}{\nu}\right)$ from Section V,

$$n_1 C(nS/n_1) = n\phi(\nu).$$

The entropy of the source is $\ln M = nR$, and thus the conditional entropy of the source given $Y_1^{n_1}$ satisfies

$$\begin{aligned}
n[R - \phi(\nu)] &\le H(U \mid Y_1^{n_1}) \\
&\le h(P_e(n_1)) + P_e(n_1)\,nR \\
&\le \ln 2 + P_e(n_1)\,nR,
\end{aligned} \tag{55}$$

where we have used the Fano inequality and then bounded the binary entropy $h(p) = -p\ln p - (1-p)\ln(1-p)$
by $\ln 2$.

To use (55) as a lower bound on $P_e(n_1)$, it is necessary for $n_1 = n\nu$ to be small enough that $\phi(\nu)$ is substantially
less than $R$, and to be specific we choose $\nu$ to satisfy

$$R - \phi(\nu) \ge \frac{1}{n}. \tag{56}$$

With this restriction, it can be seen from (55) that

$$P_e(n_1) \ge \frac{1 - \ln 2}{nR}. \tag{57}$$


Figure 4 illustrates that the following choice of $n_1$ in (58) satisfies both equation (56) and equation (57). This uses
the fact that $\phi(\nu)$ is a monotonically increasing concave function of $\nu$.

$$n_1 = \left\lfloor n\phi^{-1}(R) - \frac{1 - \phi^{-1}(R)}{C - R} \right\rfloor. \tag{58}$$







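[Figure 4: plot of $\phi(\nu)$ versus $\nu$. The vertical axis marks $R - 1/n$, $R$ and $C$; the horizontal axis marks $\nu'_n$ and $\phi^{-1}(R)$.]

Fig. 4. This shows the value of $\nu$, denoted $\phi^{-1}(R)$, at which $\phi(\nu) = R$. It also shows $\nu'_n$, where $\phi(\nu'_n) = R - 1/n$. This gives the
solution to (56) with equality, but $\nu_n = \phi^{-1}(R) - \frac{1 - \phi^{-1}(R)}{n(C - R)}$ can be seen to be less than $\nu'_n$ and thus also satisfies (56).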



The corresponding choice for $n_2$ is

$$n_2 = \left\lceil n\,[1 - \phi^{-1}(R)] + \frac{1 - \phi^{-1}(R)}{C - R} \right\rceil. \tag{59}$$

Thus with this choice of $n_1$, $n_2$, the error probability at the end of time $n_1$ satisfies (57).
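The segment lengths used for the lower bound, together with the Fano-based floor on $P_e(n_1)$, can be sketched as follows. The function names, the bisection routine and the sample parameters are assumptions made for illustration; the floor in $n_1$ reflects the need for an integer no larger than $n\nu_n$.

```python
import math


def phi(nu: float, S: float) -> float:
    """phi(nu) = (nu/2) * ln(1 + S/nu), for 0 < nu <= 1."""
    return 0.5 * nu * math.log1p(S / nu)


def phi_inverse(R: float, S: float, tol: float = 1e-12) -> float:
    """Solve phi(nu) = R by bisection (phi is increasing on (0, 1])."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if phi(mid, S) < R:
            lo = mid
        else:
            hi = mid
    return hi


def lower_bound_split(n: int, R: float, S: float):
    """Return (n1, n2) with n1 = floor(n * nu_n), nu_n taken left of phi^{-1}(R)."""
    C = phi(1.0, S)
    if not 0.0 < R < C:
        raise ValueError("need 0 < R < C")
    nu0 = phi_inverse(R, S)
    nu_n = nu0 - (1.0 - nu0) / (n * (C - R))
    n1 = math.floor(n * nu_n)
    if n1 < 1:
        raise ValueError("block length too short for a nonempty first segment")
    return n1, n - n1


def fano_floor(n: int, R: float) -> float:
    """Lower bound (1 - ln 2) / (n R) on the first-segment error probability."""
    return (1.0 - math.log(2.0)) / (n * R)


if __name__ == "__main__":
    n, R, S = 100, 0.4, 3.0  # hypothetical sample values
    n1, n2 = lower_bound_split(n, R, S)
    print(n1, n2, R - phi(n1 / n, S) >= 1.0 / n, fano_floor(n, R))
```

The printed check confirms, for the sample values, that the chosen $n_1$ leaves a gap of at least $1/n$ between $R$ and $\phi(n_1/n)$, which is the condition under which the Fano floor applies.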

The straightforward approach at this point would be to generalize the recursive relationship in (50) to arbitrary
$M$. This recursive relationship could then be used, starting at time $i = n$ and using each successively smaller $i$
until terminating the recursion at $i = n_1$, where (57) can be used. It is simpler, however, since we have already
derived (50) for $M = 2$, to define a binary coding scheme from any given $M$-ary scheme in such a way that the
binary results can be used to lower bound the $M$-ary results. This technique is similar to one used earlier in [1].

Let $X_i = f(U, Y_1^{i-1})$ for $1 \le i \le n$ be any given coding function for $U \in \mathcal{M} = \{1, \ldots, M\}$. That code is used
to define a related binary code. In particular, for each received sequence $Y_1^{n_1}$ over the first segment, we partition
the message set $\mathcal{M}$ into two subsets, $\mathcal{M}_1(Y_1^{n_1})$ and $\mathcal{M}_2(Y_1^{n_1})$. The particular partition for each $Y_1^{n_1}$ is defined later.
This partitioning defines a binary random variable $V$ as follows,

$$V = \begin{cases} 1 & U \in \mathcal{M}_1(Y_1^{n_1}) \\ 2 & U \in \mathcal{M}_2(Y_1^{n_1}). \end{cases}$$

At the end of the transmission, the receiver will use its decoder to decide $\hat{U}$. We define the decoder for $V$ at time
$n$, using the decoder of $U$, as follows,

$$\hat{V} = \begin{cases} 1 & \hat{U} \in \mathcal{M}_1(Y_1^{n_1}) \\ 2 & \hat{U} \in \mathcal{M}_2(Y_1^{n_1}). \end{cases}$$

Note that with the above mentioned definitions, whenever the $M$-ary scheme decodes correctly, the related binary
scheme does also, and thus the error probability $P_e(n)$ for the $M$-ary scheme must be greater than or equal to the
error probability $p_n$ of the related binary scheme.

The binary scheme, however, is one way (perhaps somewhat bizarre) of transmitting a binary symbol, and thus
it satisfies the results* of Section VI-A. In particular, for the binary scheme, the error probability $p_n$ at time $n$ is
lower bounded in terms of the error probability $p_{n_1}$ at time $n_1$ by (53),

$$P_e(n) \ge p_n \ge \frac{nS}{2\,g_{n_2}\!\left(\max\left\{\frac{nS}{2p_{n_1}}, 9\right\}\right)}. \tag{60}$$



Our final task is to relate the error probability $p_{n_1}$ at time $n_1$ for the binary scheme to the error probability $P_e(n_1)$
in (57) for the $M$-ary scheme. In order to do this, let $\Phi_m(Y_1^{n_1})$ be the probability of message $m$ conditional on
the received first segment $Y_1^{n_1}$. The MAP error probability for an $M$-ary decision at time $n_1$, conditional on $Y_1^{n_1}$,
is $1 - \Phi_{\max}(Y_1^{n_1})$, where $\Phi_{\max}(Y_1^{n_1}) = \max\{\Phi_1(Y_1^{n_1}), \ldots, \Phi_M(Y_1^{n_1})\}$. Thus $P_e(n_1)$, given in (57), is the mean of
$1 - \Phi_{\max}(Y_1^{n_1})$ over $Y_1^{n_1}$.

Now $p_{n_1}$ is the mean, over $Y_1^{n_1}$, of the error probability of a hypothetical MAP decoder for $V$ at time $n_1$ conditional
on $Y_1^{n_1}$, denoted $p_{n_1}(Y_1^{n_1})$. This is the smaller of the a posteriori probabilities of the subsets $\mathcal{M}_1$, $\mathcal{M}_2$ conditional on $Y_1^{n_1}$,
i.e.,

$$p_{n_1}(Y_1^{n_1}) = \min\left\{ \sum_{m \in \mathcal{M}_1(Y_1^{n_1})} \Phi_m(Y_1^{n_1}),\; \sum_{m \in \mathcal{M}_2(Y_1^{n_1})} \Phi_m(Y_1^{n_1}) \right\}. \tag{61}$$

The following lemma shows that by an appropriate choice of partition for each $Y_1^{n_1}$, this binary error probability
is lower bounded by 1/2 the corresponding $M$-ary error probability.

Lemma 6.1: For any probability distribution $\Phi_1, \ldots, \Phi_M$ on a message set $\mathcal{M}$ with $M \ge 2$, let $\Phi_{\max} =
\max\{\Phi_1, \ldots, \Phi_M\}$. Then there is a partition of $\mathcal{M}$ into two subsets, $\mathcal{M}_1$ and $\mathcal{M}_2$, such that

$$\sum_{m \in \mathcal{M}_1} \Phi_m \ge \frac{1 - \Phi_{\max}}{2} \quad \text{and} \quad \sum_{m \in \mathcal{M}_2} \Phi_m \ge \frac{1 - \Phi_{\max}}{2}. \tag{62}$$

Proof: 

Order the messages in order of decreasing $\Phi_m$. Assign the messages one by one in this order to the sets $\mathcal{M}_1$
and $\mathcal{M}_2$. When assigning the $k$th most likely message, we calculate the total probability of the messages that
have already been assigned to each set, and assign the $k$th message to the set which has the smaller probability
mass. If the probability masses of the sets are the same, we choose one of the sets arbitrarily. With such a procedure,
the difference in the probabilities of the sets, as they evolve, never exceeds $\Phi_{\max}$. After all messages have been
assigned, let

$$\Phi'_1 = \sum_{m \in \mathcal{M}_1} \Phi_m; \qquad \Phi'_2 = \sum_{m \in \mathcal{M}_2} \Phi_m.$$

We have seen that $|\Phi'_1 - \Phi'_2| \le \Phi_{\max}$. Since $\Phi'_1 + \Phi'_2 = 1$, (62) follows. ∎

* This is not quite as obvious as it sounds. The binary scheme here is not characterized by a coding function $f(V, Y_1^{i-1})$ as in Section
VI-A, but rather is a randomized binary scheme. That is, for a given $Y_1^{n_1}$ and a given choice of $V$, the subsequent transmitted symbols $X_i$
are functions not only of $V$ and $Y_1^{i-1}$, but also of a random choice of $U$ conditional on $V$. The basic conclusion of (50) is then justified by
averaging over both $Y_1^{i-1}$ and the choice of $U$ conditional on $V$.

Since the error probability for the binary scheme is now at least one half of that for the $M$-ary scheme for each
$Y_1^{n_1}$, we can take the mean over $Y_1^{n_1}$, getting $p_{n_1} \ge P_e(n_1)/2$. Combining this with (60) and (57),

$$P_e(n) \ge \frac{nS}{2\,g_{n_2}\!\left(\max\left\{\frac{n^2 R S}{1 - \ln 2}, 9\right\}\right)},$$

where $n_2$ is given in (59). The exact terms in this expression are not particularly interesting because of the very
weak bounds on energy at each channel use. What is interesting is that the order of exponent in both the upper
bound of (42) and (41) and the lower bound here are increasing linearly* at the same rate $1 - \phi^{-1}(R)$.
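The greedy assignment used in the proof of Lemma 6.1 can be sketched in a few lines. The function name `greedy_partition` and the sample distribution are assumptions introduced here for illustration; ties are broken in favor of the first set, which is one of the arbitrary choices the proof allows.

```python
def greedy_partition(probs):
    """Split message indices into two sets, most likely message first.

    Each message goes to the set that currently has the smaller total
    probability (the first set on ties). Returns (sets, masses).
    """
    order = sorted(range(len(probs)), key=lambda m: probs[m], reverse=True)
    sets = ([], [])
    masses = [0.0, 0.0]
    for m in order:
        k = 0 if masses[0] <= masses[1] else 1
        sets[k].append(m)
        masses[k] += probs[m]
    return sets, masses


if __name__ == "__main__":
    sample = [0.5, 0.2, 0.15, 0.1, 0.05]  # hypothetical posterior distribution
    sets, masses = greedy_partition(sample)
    print(sets, masses, (1.0 - max(sample)) / 2.0)
```

For any input distribution, the smaller of the two resulting masses should be at least $(1 - \Phi_{\max})/2$, which is the statement of the lemma.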

VII. Conclusions 

The SK data transmission scheme can be viewed as ordinary PAM combined with the Elias scheme for noise
reduction. The SK scheme can also be improved by incorporating the PAM structure into the transmission of the
error in the receiver's estimate of the message, particularly during the latter stages. For the bandlimited version,
this leads to an error probability that decreases with an exponential order $an + b$, where $a = 1 - \phi^{-1}(R)$ and $b$ is
a constant. In the broadband version, the error probability is zero for sufficiently large finite constraint durations
$T$. A lower bound to error probability, valid for all $R < C$, was derived. This lower bound also decreases with an
exponential order $an + b'(n)$, where again $a = 1 - \phi^{-1}(R)$ and $b'(n)$ is essentially a constant.* It is interesting
to observe that the strategy yielding the upper bound uses almost all the available energy in the first phase, using
at most 5 units of energy in the second phase. The lower bound relaxed the energy constraint, allowing all the
allowable energy to be used in the first phase and then to be used repeatedly in each time unit of the second
phase. The fact that both bounds decrease with the same exponential order suggests that the energy available for
the second phase is not of primary importance. An open theoretical question is the minimum overall energy under
which the error probability for two code words can be zero in the infinite bandwidth case.